Supplementary: Simultaneously Learning DNA Motif along with Its Position and Sequence Rank Preferences through EM Algorithm
Similar resources
Simultaneously Learning DNA Motif along with Its Position and Sequence Rank Preferences through EM Algorithm
Although de novo motifs can be discovered through mining over-represented sequence patterns, this approach misses some real motifs and generates many false positives. To improve accuracy, one solution is to consider some additional binding features (i.e. position preference and sequence rank preference). This information is usually required from the user. This paper presents a de novo motif dis...
RankMotif++: a motif-search algorithm that accounts for relative ranks of K-mers in binding transcription factors
MOTIVATION The sequence specificity of DNA-binding proteins is typically represented as a position weight matrix in which each base position contributes independently to relative affinity. Assessment of the accuracy and broad applicability of this representation has been limited by the lack of extensive DNA-binding data. However, new microarray techniques, in which preferences for all possible ...
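To make the position weight matrix representation mentioned above concrete, here is a minimal Python sketch of a log-odds PWM in which each base position contributes independently to the score. It is an illustration only and is not the RankMotif++ method. The function names (`build_pwm`, `score`, `best_hit`), the toy binding sites, the uniform background of 0.25, and the pseudocount of 1 are all assumptions made for this example.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from aligned, equal-length binding sites.

    Assumes a uniform background and an additive pseudocount
    (both are illustrative choices, not taken from the abstract).
    """
    width = len(sites[0])
    pwm = []
    for i in range(width):
        # Count each base in column i, starting from the pseudocount.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        # Log-odds of the column frequency against the background.
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm


def score(pwm, kmer):
    """Sum the per-position log-odds: positions contribute independently."""
    return sum(column[base] for column, base in zip(pwm, kmer))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    return max(
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Hypothetical toy binding sites, invented for this example.
toy_sites = ["TATA", "TATA", "TACA", "TATA"]
toy_pwm = build_pwm(toy_sites)
hit_score, hit_start = best_hit(toy_pwm, "GGTATAGG")
```

Because the score is a plain sum over columns, a model of this kind cannot express dependencies between positions, which is the limitation the abstract says has been hard to assess without extensive binding data.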
W-AlignACE: an improved Gibbs sampling algorithm based on more accurate position weight matrices learned from sequence and gene expression/ChIP-chip data
MOTIVATION Position weight matrices (PWMs) are widely used to depict the DNA binding preferences of transcription factors (TFs) in computational molecular biology and regulatory genomics. Thus, learning an accurate PWM to characterize the binding sites of a specific TF is a fundamental problem that plays an important role in modeling regulatory motifs and also in discovering the regulatory targ...
Deep and wide digging for binding motifs in ChIP-Seq data
SUMMARY ChIP-Seq data are a new challenge for motif discovery. Such data typically consist of thousands of DNA segments with base-specific coverage values. We present a new version of our DNA motif discovery software ChIPMunk adapted for ChIP-Seq data. ChIPMunk is an iterative algorithm that combines greedy optimization with bootstrapping and uses coverage profiles as motif positional prefer...
HIGEDA: a hierarchical gene-set genetics based algorithm for finding subtle motifs in biological sequences
MOTIVATION Identification of motifs in biological sequences is a challenging problem because such motifs are often short, degenerate, and may contain gaps. Most algorithms that have been developed for motif-finding use the expectation-maximization (EM) algorithm iteratively. Although EM algorithms can converge quickly, they depend strongly on initialization parameters and can converge to local ...